Robot Learning with State-Dependent Exploration
Authors
Abstract
Policy gradient algorithms are among the few learning methods successfully applied to demanding real-world problems including those found in the field of robotics. While Likelihood Ratio (LR) methods are typically used to estimate the gradient, they suffer from high variance due to random exploration at each timestep during the rollout. We therefore evaluate several policy gradient methods with state-dependent exploration (SDE), a recently introduced alternative to random exploration, which deterministically returns the same action for a given state during one episode. We apply SDE to a simulated robotics task with realistically modelled physics, and compare it to random exploration within several different learning schemes. Our experiments show that SDE outperforms traditional random exploration in almost every case.
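The difference between the two exploration schemes described in the abstract can be illustrated with a minimal sketch. The abstract only states that SDE returns the same action for a given state within one episode; the linear controller, the linear exploration function, the Gaussian sampling, and all class and function names below are assumptions made for illustration and are not taken from the paper.

```python
import random


def linear_output(state, weights):
    """Dot product of a weight vector and a state vector."""
    return sum(w * s for w, s in zip(weights, state))


class RandomExploration:
    """Adds fresh Gaussian noise to the action at every timestep."""

    def __init__(self, sigma, rng):
        self.sigma = sigma
        self.rng = rng

    def reset(self):
        # Nothing to resample: the noise is drawn anew on every call.
        pass

    def action(self, state, weights):
        return linear_output(state, weights) + self.rng.gauss(0.0, self.sigma)


class StateDependentExploration:
    """Hypothetical SDE sketch: the perturbation is a function of the state.

    The function's parameters are sampled once per episode (in reset), so
    the same state yields the same action until the next reset.
    """

    def __init__(self, sigma, n_inputs, rng):
        self.sigma = sigma
        self.n_inputs = n_inputs
        self.rng = rng
        self.reset()

    def reset(self):
        # Assumed form: a linear exploration function with Gaussian weights.
        self.noise_weights = [
            self.rng.gauss(0.0, self.sigma) for _ in range(self.n_inputs)
        ]

    def action(self, state, weights):
        perturbation = linear_output(state, self.noise_weights)
        return linear_output(state, weights) + perturbation


# Usage: query the same state twice within one episode.
weights = [0.5, -0.2]
state = [1.0, 2.0]

sde = StateDependentExploration(sigma=0.3, n_inputs=2, rng=random.Random(0))
sde_first = sde.action(state, weights)
sde_second = sde.action(state, weights)   # identical to sde_first

rnd = RandomExploration(sigma=0.3, rng=random.Random(0))
rnd_first = rnd.action(state, weights)
rnd_second = rnd.action(state, weights)   # differs from rnd_first
```

In this sketch, calling `reset()` at the start of each episode is what changes the exploratory behaviour from one rollout to the next; within a rollout the perturbed policy stays deterministic, which is the property the abstract credits with reducing the variance caused by per-timestep noise.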
Related resources
Exploring parameter space in reinforcement learning
This paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action perturbation. We review two recent parameter-ex...
A Q-learning Based Continuous Tuning of Fuzzy Wall Tracking
A simple, easy-to-implement algorithm is proposed to address the wall-tracking task of an autonomous robot. The robot should navigate in unknown environments, find the nearest wall, and track it solely based on locally sensed data. The proposed method benefits from coupling fuzzy logic and Q-learning to meet the requirements of autonomous navigation. Fuzzy if-then rules provide a reliable decision maki...
Visual Tracking using Learning Histogram of Oriented Gradients by SVM on Mobile Robot
The intelligence of a mobile robot is highly dependent on its vision. The main objective of an intelligent mobile robot is its ability to perform online image processing, object detection, and especially visual tracking, which is a complex task in stochastic environments. Tracking algorithms suffer from sequence challenges such as illumination variation, occlusion, and background clutter, so an a...
Robot Learning: Exploration and Continuous Domains
The goal of this workshop was to discuss two major issues: efficient exploration of a learner's state space, and learning in continuous domains. The common themes that emerged in presentations and in discussion were the importance of choosing one's domain assumptions carefully, mixing controllers/strategies, avoidance of catastrophic failure, new approaches with difficulties with reinforcement ...
Upper Confidence Weighted Learning for Efficient Exploration in Multiclass Prediction with Binary Feedback
We introduce a novel algorithm called Upper Confidence Weighted Learning (UCWL) for online multiclass learning from binary feedback. UCWL combines the Upper Confidence Bound (UCB) framework with the Soft Confidence Weighted (SCW) online learning scheme. UCWL achieves state of the art performance (especially on noisy and nonseparable data) with low computational costs. Estimated confidence inter...